The Dynamics Between Socioeconomic Connectedness, Cohesiveness, and Civic Engagement

This blog is the final project blog for the DH140 course.
Author

Mia Sadowski

Published

August 4, 2023

Modified

August 4, 2023

Introduction to the Project

The dataset analyzed in this final project is the Social Capital Atlas. It comes from Data for Good at Meta (the company that owns Facebook and Instagram) and is organized into separate .csv files covering US counties, ZIP codes, colleges, and high schools. It reports three main measures. First, economic connectedness measures the degree of friendship between people of different or similar socioeconomic statuses, depending on the category analyzed. Second, cohesiveness measures how much friendships tend to be supported by mutual friends. Finally, civic engagement is quantified by how often people participate in volunteer activities publicly on social media.

Motivation

Growing up, I noticed economic disparity within my own hometown and how it shaped relationship circles. I grew up in a suburban beach town called Ventura, about an hour away from Los Angeles County. It was generally a quiet town; however, I noticed some economic and racial disparity, with certain communities being separated from the rest. I especially noticed this in friend groups within public high schools. Those who were wealthier and lived in more expensive areas typically congregated together, while those who lived in less expensive areas were excluded from these groups and instead congregated amongst themselves. To my knowledge, most of the exclusion came not from direct bullying but from differences in where people lived within the town and from general community expectations.

While I am unable to explore racial disparity in this dataset, I am able to see whether this economic disparity affects friend groups throughout the country. It will also be interesting to see whether higher rates of inclusion go along with higher rates of volunteering, which might point to a set of values within a community that truly sets it apart, or whether the two are unrelated.

Research Questions

  1. Among an adult population, how does the connectedness between individuals with varying socioeconomic statuses relate to the cohesiveness of friend groups?

  2. To what extent does higher social cohesion in a society influence rates of civic engagement?

Methods

This section will explain our data and analytical process for addressing our research questions.

Summary Information

The Social Capital Atlas consists entirely of quantitative data derived from Facebook users. It draws on information from Facebook to which users have granted access, and it covers nearly every county in the United States. While there are separate spreadsheets for ZIP codes, colleges, and high schools throughout the country, this research project will analyze counties specifically for a more focused narrative. Only users who were between 25 and 44 years old, had been on Facebook at least once in the prior 30 days, had at least 100 U.S.-based Facebook friends, and had a non-missing ZIP code were included in the data. The authors apply privacy protections so that personal data about individuals cannot be learned from the dataset.

Let’s now take a look at how the data is formatted.

import pandas as pd
import matplotlib.pyplot as plt
import folium

df = pd.read_csv('https://data.humdata.org/dataset/85ee8e10-0c66-4635-b997-79b6fad44c71/resource/ec896b64-c922-4737-b759-e4bd7f73b8cc/download/social_capital_county.csv')

df
county county_name num_below_p50 pop2018 ec_county ec_se_county child_ec_county child_ec_se_county ec_grp_mem_county ec_high_county ... child_exposure_county child_high_exposure_county bias_grp_mem_county bias_grp_mem_high_county child_bias_county child_high_bias_county clustering_county support_ratio_county volunteering_rate_county civic_organizations_county
0 1001 Autauga, Alabama 5922.39210 55200.0 0.72077 0.00831 1.11754 0.02467 0.77223 1.21372 ... 1.14816 1.19944 0.05526 -0.22748 0.02668 -0.08229 0.10347 0.98275 0.04355 0.01518
1 1003 Baldwin, Alabama 15458.39600 208107.0 0.74313 0.00661 0.83064 0.01629 0.76215 1.28302 ... 0.84588 1.00797 0.02950 -0.21519 0.01802 -0.05241 0.09624 0.98684 0.06117 0.01526
2 1005 Barbour, Alabama 4863.97360 25782.0 0.41366 0.00978 0.58541 0.02707 0.35927 0.91897 ... 0.63306 0.71967 0.13457 -0.34086 0.07528 -0.19714 0.14911 0.99911 0.02093 0.01474
3 1007 Bibb, Alabama 3061.49340 22527.0 0.63152 0.01175 0.72265 0.03027 0.68094 1.06378 ... 0.71433 0.72395 0.04108 -0.27727 -0.01165 -0.15993 0.14252 0.99716 0.05294 0.01439
4 1009 Blount, Alabama 6740.91160 57645.0 0.72562 0.00985 0.76096 0.02466 0.79584 1.10569 ... 0.74821 0.79375 0.00217 -0.24946 -0.01704 -0.08745 0.11243 0.99069 0.05704 0.01724
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3084 56037 Sweetwater, Wyoming 2402.96900 44117.0 0.96235 0.01280 1.14781 0.02794 1.13449 1.32399 ... 1.12164 1.12907 0.09519 -0.12030 -0.02333 -0.08683 0.10809 0.99710 0.07321 0.01225
3085 56039 Teton, Wyoming 783.24982 23059.0 1.07623 0.01744 1.23113 0.04692 1.13296 1.63551 ... 1.32874 1.35341 0.14337 -0.11958 0.07346 -0.07364 0.09253 0.98648 0.09747 0.03223
3086 56041 Uinta, Wyoming 2174.06180 20609.0 0.95452 0.01404 1.04595 0.03455 0.92831 1.32040 ... 1.05446 1.06284 0.13816 -0.12194 0.00808 -0.06074 0.11204 0.99479 0.06942 0.01222
3087 56043 Washakie, Wyoming 872.51544 8129.0 0.90667 0.01928 0.90794 0.04962 0.78223 1.29208 ... 0.88480 0.88589 0.06667 -0.20435 -0.02615 -0.06076 0.11592 0.99708 0.05843 0.03512
3088 56045 Weston, Wyoming 635.28436 7100.0 0.97840 0.02036 1.09118 0.05823 0.93135 1.28553 ... 1.03325 1.05526 0.02279 -0.17229 -0.05606 -0.04609 0.11927 0.99730 0.13635 0.02375

3089 rows × 26 columns

As mentioned, nearly every column aside from pop2018 (the population in 2018, which was not constructed from Facebook data) measures connectedness, cohesiveness, or civic engagement in some form. While there are many columns, many of them will not be used in the final analysis since they are irrelevant to the research questions. Here is a summary of the columns this project will use, followed by an explanation of why it will not use the others.

ec_county - This measures the level of economic connectedness within a county. It is calculated as two times the share of high-SES (socioeconomic status) friends among low-SES individuals, averaged over the low-SES individuals in the county. This equation is from the research paper “Social capital I: Measurement and Associations with Economic Mobility.”
ec_high_county - This measures the connectedness among high-SES individuals only, once again as an averaged value. The main difference is that it does not capture how connected they are with people of lower economic status.
exposure_grp_mem_county - This measures how often low-SES individuals are exposed to high-SES individuals, using a formula similar to that of ec_county.
exposure_grp_mem_high_county - This measures how often high-SES individuals are exposed to other high-SES individuals, once again using a similar formula.
clustering_county - This is the average fraction of an individual’s friend pairs who are also friends with each other, counting only friends within the relevant county.
volunteering_rate_county - The percentage of Facebook users within a county that are predicted to be members of a volunteering or activism group. Secret groups or large groups that Facebook identified as “clearly misclassified” were not included in this calculation.
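
To make the ec_county definition above concrete, here is a minimal sketch of the calculation on invented numbers. The function name `economic_connectedness` and the toy friend shares are assumptions for illustration only; the published values are computed by the dataset's authors from Facebook friendship data, not by this code.

```python
def economic_connectedness(high_ses_friend_shares):
    """Return two times the mean share of high-SES friends.

    `high_ses_friend_shares` holds one value per low-SES individual:
    the fraction of that person's friends who are high-SES.
    """
    if not high_ses_friend_shares:
        raise ValueError("need at least one low-SES individual")
    mean_share = sum(high_ses_friend_shares) / len(high_ses_friend_shares)
    return 2 * mean_share


# Hypothetical county with three low-SES individuals (invented shares).
toy_shares = [0.25, 0.5, 0.75]
print(economic_connectedness(toy_shares))  # 1.0
```

Under this formula, a score of 1 corresponds to low-SES individuals having, on average, half of their friends in the high-SES group, and scores below 1 correspond to fewer.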

There are many other columns. One of the main groups covers childhood connectedness specifically. These columns are not included in this research report because the research questions address the adult population. Additionally, two columns are excluded because they would not address the research questions effectively: support_ratio_county, which measures the proportion of friend pairs who share a third mutual friend, and civic_organizations_county, which measures the number of Facebook Pages predicted to be “Public Good” within a county. As part of the cleaning process, these columns will be removed. The standard error columns will be kept, as they may need to be referred to in the conclusion. The code below removes the unused columns and then lists the minimum and maximum of each remaining column.

columns_removed = ['num_below_p50', 'pop2018', 'child_ec_county', 'child_ec_se_county', 'ec_grp_mem_county', 'child_high_ec_county', 'child_high_ec_se_county', 'ec_grp_mem_high_county','child_exposure_county','child_high_exposure_county','bias_grp_mem_county','bias_grp_mem_high_county','child_bias_county','child_high_bias_county', 'support_ratio_county', 'civic_organizations_county']

df = df.drop(columns=columns_removed)

column_max = df.max()
column_min = df.min()
column_stats_df = pd.DataFrame({'Column': df.columns, 'Minimum': column_min, 'Maximum': column_max})

print(column_stats_df)
                                                    Column  \
county                                              county   
county_name                                    county_name   
ec_county                                        ec_county   
ec_se_county                                  ec_se_county   
ec_high_county                              ec_high_county   
ec_high_se_county                        ec_high_se_county   
exposure_grp_mem_county            exposure_grp_mem_county   
exposure_grp_mem_high_county  exposure_grp_mem_high_county   
clustering_county                        clustering_county   
volunteering_rate_county          volunteering_rate_county   

                                                Minimum                Maximum  
county                                             1001                  56045  
county_name                   Abbeville, South Carolina  Ziebach, South Dakota  
ec_county                                       0.29469                 1.3597  
ec_se_county                                    0.00436                0.05023  
ec_high_county                                  0.70062                1.71507  
ec_high_se_county                               0.00475                0.05099  
exposure_grp_mem_county                          0.2552                1.48628  
exposure_grp_mem_high_county                    0.51013                1.66616  
clustering_county                               0.07162                0.26097  
volunteering_rate_county                        0.00965               0.308736  

Summary Statistics

This is the range of values for each column:
ec_county - 0.29469 to 1.3597. The lower the number, the less connectedness there is between high-SES and low-SES individuals. The standard error (ec_se_county) ranges from around 0.004 to 0.05 per county.
ec_high_county - 0.70062 to 1.71507. The lower the number, the less connectedness there is among high-SES individuals. The standard error (ec_high_se_county) also ranges from around 0.004 to 0.05 per county, similar to the ec_county statistic.
exposure_grp_mem_county - 0.2552 to 1.48628. The lower the number, the less exposure low-SES individuals have to high-SES individuals.
exposure_grp_mem_high_county - 0.51013 to 1.66616. The lower the number, the less exposure high-SES people have to other high-SES individuals.
clustering_county - 0.07162 to 0.26097. A lower number means fewer mutual friend circles within the county.
volunteering_rate_county - 0.00965 to 0.308736. The lower the number, the fewer volunteers the county has.
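
As an illustration of what a clustering_county value represents, the sketch below computes the fraction of one person's friend pairs who are also friends with each other, then averages it over a tiny invented network. The names `local_clustering` and `average_clustering`, the toy network, and the choice to score people with fewer than two friends as 0 are all assumptions for this example, not the dataset authors' implementation.

```python
from itertools import combinations


def local_clustering(person, friendships):
    """Fraction of `person`'s friend pairs who are friends with each other."""
    friends = sorted(friendships[person])
    pairs = list(combinations(friends, 2))
    if not pairs:
        # Assumption: someone with fewer than two friends scores 0.
        return 0.0
    linked = sum(1 for a, b in pairs if b in friendships[a])
    return linked / len(pairs)


def average_clustering(friendships):
    """Mean of the per-person fractions across the whole network."""
    scores = [local_clustering(p, friendships) for p in friendships]
    return sum(scores) / len(scores)


# Hypothetical friendship network stored as adjacency sets.
toy_network = {
    "ana": {"ben", "cam", "dee"},
    "ben": {"ana", "cam"},
    "cam": {"ana", "ben"},
    "dee": {"ana"},
}

# Ana has three friend pairs, and only (ben, cam) know each other.
print(local_clustering("ana", toy_network))
print(average_clustering(toy_network))
```

Read this way, the observed maximum of about 0.26 means that even in the most cohesive county, roughly a quarter of a typical person's friend pairs are themselves friends.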

We will also note the mean (average) values for future reference.

mean_values = df.mean(numeric_only=True)
print(mean_values)
county                          30218.783101
ec_county                           0.814464
ec_se_county                        0.013409
ec_high_county                      1.252636
ec_high_se_county                   0.014754
exposure_grp_mem_county             0.906089
exposure_grp_mem_high_county        1.078581
clustering_county                   0.116456
volunteering_rate_county            0.078068
dtype: float64

Analytical Process

This section will explain the process and visualizations used to answer the research questions. Before addressing the questions, we should get to know the dataset, so the following section goes through data exploration. Essentially, we will create some visualizations to spot any anomalies, patterns, and anything else that might need cleaning.

Because this dataset is entirely numerical and based on location, the best ways to explore it are scatterplots, histograms, and heat maps. These also happen to be the best types of visualization for further analysis. Before investigating further, however, we will look specifically into the following questions.

    1) Is there any relationship between economic connectedness between low and high SES and just high SES? If so, what’s the pattern? Do those categories seem to have a balanced distribution, or are there more counties that are on the lower or higher end of the spectrum?
    2) What do the levels of clustering_county, or fractions of friend pairs, look like around the country? Is there any clear average?
    3) What do the levels of volunteering rates look like around the country? Is there any clear average?

After we have a better understanding of what the data looks like, we will investigate the research questions. This will involve creating visualizations that explore the direct connections between connectedness, cohesiveness, and civic engagement, rather than focusing on each factor individually as in the exploration stage. Then, we can move on to our discussion and analyze how the results connect with the research questions.

Analysis

Exploration Stage

Data Exploration #1

df.plot.scatter(x='ec_county', y='ec_high_county')
plt.xlabel("Economic Connectedness between low and high SES")
plt.ylabel("Connectedness within high SES")
plt.title("Relationship between Economic Connectedness of Low and High SES and Economic Connectedness within High SES")
plt.show()

This scatterplot explores the potential correlation between connectedness among individuals from low and high SES backgrounds and connectedness exclusively among high-SES individuals.

There seems to be a fairly balanced, upward linear pattern between the two indicators. For the most part, the higher the economic connectedness between low and high SES, the higher it is among those of high SES alone. Although a few outliers are present, they do not deviate strongly from the overall trend of the data.

It is important to note that the majority of outliers cluster towards the upper end of the high-SES connectedness spectrum. This could imply that many counties tend to prioritize connectedness within one’s own social class rather than diverse connections. It’ll be interesting to see whether this correlates with friendship circles as we get into the analysis.
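
The visual impression of an upward linear pattern can also be checked numerically. Below is a minimal sketch that computes a Pearson correlation between two columns; the helper name `pearson_r` and the toy values are assumptions for illustration, and the real check would pass the project's df instead of `toy`.

```python
import pandas as pd


def pearson_r(frame, x_col, y_col):
    """Pearson correlation between two columns, skipping rows with missing values."""
    pair = frame[[x_col, y_col]].dropna()
    return pair[x_col].corr(pair[y_col])


# Invented stand-in values, including one missing entry.
toy = pd.DataFrame({
    "ec_county": [0.4, 0.6, 0.8, 1.0, None],
    "ec_high_county": [0.9, 1.15, 1.25, 1.5, 1.2],
})

r = pearson_r(toy, "ec_county", "ec_high_county")
print(round(r, 3))
```

A value near 1 would support the upward linear reading, while a value near 0 would suggest the pattern is weaker than it looks.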

While the scatterplot indicates a relatively even distribution of economic connectedness country-wide, a histogram of each category is needed to double check.

Data Exploration #2

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)  
plt.hist(df['ec_county'], bins=10, range=(0.2, 1.4), edgecolor='black')
plt.xlabel("Economic Connectedness per ec_county")
plt.ylabel("Frequency")
plt.title("Histogram of Economic Connectedness Between Low and High-SES")

plt.subplot(1, 2, 2)  
plt.hist(df['ec_high_county'], bins=10, range=(0.7, 1.7), edgecolor='black')
plt.xlabel("Economic Connectedness per ec_high_county")
plt.ylabel("Frequency")
plt.title("Histogram of Economic Connectedness for only High-SES")

plt.subplots_adjust(wspace=0.4)
plt.show()

The original prediction of even distribution across the economic connectedness scores is incorrect.

The economic connectedness between low and high SES roughly follows a bell curve, meaning the highest frequency is in the mid-range of the distribution, at a score of around 0.8. This also happens to be the mean score. In simpler terms, counties tend to have a moderate level of connectedness between people of different socioeconomic statuses, and fewer have either high or low levels.

On the other hand, the economic connectedness among only those of high SES is concentrated slightly toward the higher range of the data. This means that those within the higher socioeconomic category tend to have stronger connections with one another. This might also explain why most of the deviations in the scatterplot showed higher connectedness among high-SES individuals.
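
The shapes described above can be summarized with a skewness statistic rather than read off the histograms by eye. The following is a small sketch on invented values; `skew_summary` is a hypothetical helper, and in the project it would be called on df with the same column names.

```python
import pandas as pd


def skew_summary(frame, columns):
    """Sample skewness per column.

    Values near 0 suggest a symmetric shape, positive values a longer
    right tail, and negative values a longer left tail.
    """
    return frame[columns].skew()


# Invented stand-in values.
toy = pd.DataFrame({
    "ec_county": [0.6, 0.7, 0.8, 0.9, 1.0],          # symmetric
    "ec_high_county": [0.8, 1.2, 1.3, 1.35, 1.4],    # bunched at the high end
})

skews = skew_summary(toy, ["ec_county", "ec_high_county"])
print(skews)
```

A distribution bunched at the high end, like the one described for ec_high_county, shows up as a negative skewness value.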

Now that the first question is fully addressed, we will analyze the clustering rates around the country.

Data Exploration #3

# No .geojson was found that lists counties as "County Name, State Name"
# like the dataset, hence this helper that keeps only the county name.
def extract_county_name(name):
    parts = name.split(',')
    if len(parts) > 1:
        return parts[0].strip()  
    else:
        return name
    
df['county_name'] = df['county_name'].apply(extract_county_name)
county_geo_url = "https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json"
m = folium.Map(location=[36, -98], zoom_start=3)


title_html = '''
<h1 align="center" style="font-size:18px"><b>Cohesiveness Rates around the Country</b></h1>
<p align="center">Cohesiveness is calculated through data regarding clustering_county.</p>
'''

m.get_root().html.add_child(folium.Element(title_html))
folium.Choropleth(
    geo_data=county_geo_url,
    data=df,
    columns=['county_name', 'clustering_county'],
    key_on='feature.properties.NAME',  
    fill_color='Spectral',
).add_to(m)

m

Before analyzing this map, it is clear that some counties are missing, as they are colored in black. We will first do some extra cleaning of the dataset so that the results are as accurate as possible and every county in the dataset is represented.

Data Cleaning

We will first check whether any rows have missing values. If the mapped column (clustering_county) is missing values, the gaps in the map come from the data itself. Otherwise, the gaps mean that some county names in the dataframe do not match the names in the .geojson and need to be remapped.

missing_values = df[df.isnull().any(axis=1)]
print(missing_values)
      county         county_name  ec_county  ec_se_county  ec_high_county  \
52      1105               Perry        NaN           NaN             NaN   
71      2060         Bristol Bay        NaN           NaN             NaN   
82      2164  Lake and Peninsula        NaN           NaN             NaN   
186     6003              Alpine        NaN           NaN             NaN   
255     8023            Costilla        NaN           NaN             NaN   
...      ...                 ...        ...           ...             ...   
2707   48433           Stonewall        NaN           NaN             NaN   
2713   48447        Throckmorton        NaN           NaN             NaN   
2748   49009             Daggett        NaN           NaN             NaN   
2759   49031               Piute        NaN           NaN             NaN   
2831   51091            Highland        NaN           NaN             NaN   

      ec_high_se_county  exposure_grp_mem_county  \
52                  NaN                      NaN   
71                  NaN                      NaN   
82                  NaN                      NaN   
186                 NaN                      NaN   
255                 NaN                      NaN   
...                 ...                      ...   
2707                NaN                      NaN   
2713                NaN                      NaN   
2748                NaN                      NaN   
2759                NaN                      NaN   
2831                NaN                      NaN   

      exposure_grp_mem_high_county  clustering_county  \
52                             NaN           0.198011   
71                             NaN           0.133101   
82                             NaN           0.156810   
186                            NaN           0.091637   
255                            NaN           0.100376   
...                            ...                ...   
2707                           NaN           0.133058   
2713                           NaN           0.105239   
2748                           NaN           0.094783   
2759                           NaN           0.123849   
2831                           NaN           0.152162   

      volunteering_rate_county  
52                    0.031368  
71                    0.042672  
82                    0.086069  
186                   0.146537  
255                   0.062568  
...                        ...  
2707                  0.033803  
2713                  0.244292  
2748                  0.026229  
2759                  0.030337  
2831                  0.188111  

[77 rows x 10 columns]

While ec_county is missing values, none are missing from clustering_county or volunteering_rate_county. Since the map with gaps was plotting clustering_county, the gaps must come from county names that did not match up with the .geojson file. We will check for these names.

import requests
response = requests.get(county_geo_url)
county_geo_data = response.json()
geojson_names = set(feature['properties']['NAME'] for feature in county_geo_data['features'])
df_names = set(df['county_name'])
missing_names = df_names - geojson_names
print("Missing county names:", missing_names)
Missing county names: {'St. Louis City', 'Bedford County', 'DoÃ\x83±a Ana', 'Baltimore City', 'Roanoke County', 'Richmond County', 'Fairfax County', 'St. Louis County', 'Roanoke City', 'Ste Genevieve', 'Baltimore County', 'Richmond City', 'Franklin County'}

These are the county names in the dataset that have no match in the .geojson.

Many of them are named differently in the two sources. Here are some reasons why they do not match up.
    1) Where the dataset distinguishes a county from a city of the same name (e.g. “Richmond County” and “Richmond City”), the .geojson names the feature simply “Richmond”. The .geojson name does not indicate whether a feature is a city or a county.
    2) Some names are spelled or punctuated differently, such as “Ste Genevieve”, which is written with a period in some versions. Others are garbled by encoding problems (e.g. “DoÃ\x83±a Ana” was supposed to be “Doña Ana”).
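
A related caveat is that a bare county name cannot tell apart same-named counties in different states. One possible alternative, sketched here and not used in this project, is to match on county codes instead of names. The sketch assumes that the county column holds numeric FIPS codes and that the .geojson GEO_ID values are the prefix "0500000US" followed by the five-digit code; `fips_to_geo_id` is a hypothetical helper and the last two clustering values are invented.

```python
import pandas as pd

# Assumed prefix of the GEO_ID property for county-level features.
GEO_ID_PREFIX = "0500000US"


def fips_to_geo_id(fips):
    """Turn a numeric county code such as 1001 into '0500000US01001'."""
    return f"{GEO_ID_PREFIX}{int(fips):05d}"


toy = pd.DataFrame({
    "county": [1001, 51159, 51760],
    "county_name": ["Autauga", "Richmond County", "Richmond City"],
    "clustering_county": [0.10347, 0.12, 0.09],  # last two values invented
})
toy["geo_id"] = toy["county"].apply(fips_to_geo_id)
print(toy["geo_id"].tolist())
# ['0500000US01001', '0500000US51159', '0500000US51760']
```

With such a column, a choropleth could in principle use columns=['geo_id', 'clustering_county'] together with key_on='feature.properties.GEO_ID', which would avoid renaming features by hand.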

We will fix all of these through a name mapping process, and then verify to make sure we do not have any more missing names.

name_mapping = {
    "0500000US51161": {"NAME": "Roanoke County"},
    "0500000US51770": {"NAME": "Roanoke City"},
    "0500000US24510": {"NAME": "Baltimore City"},
    "0500000US29189": {"NAME": "St. Louis County"},
    "0500000US51059": {"NAME": "Fairfax County"},
    "0500000US24005": {"NAME": "Baltimore County"},
    "0500000US51019": {"NAME": "Bedford County"},
    "0500000US35013": {"NAME": "DoÃ\x83±a Ana"},
    "0500000US51159": {"NAME": "Richmond County"},
    "0500000US51760": {"NAME": "Richmond City"},
    "0500000US29510": {"NAME": "St. Louis City"},
    "0500000US29186": {"NAME": "Ste Genevieve"},
    "0500000US51067": {"NAME": "Franklin County"},
}

def update_geojson_feature_properties(feature, properties):
    feature['properties'].update(properties)

for feature in county_geo_data['features']:
    # GEO_ID is the identifier used as the key in name_mapping
    geo_id = feature['properties'].get('GEO_ID', '')
    if geo_id in name_mapping:
        update_geojson_feature_properties(feature, name_mapping[geo_id])

geojson_names = set(feature['properties']['NAME'] for feature in county_geo_data['features'])
df_names = set(df['county_name'])
missing_names = df_names - geojson_names

if missing_names:
    print("Missing county names:", missing_names)
else:
    print("There are no more missing county names.")
There are no more missing county names.

With all county names accounted for, we can now proceed with our map analysis.

It’s important to acknowledge that some counties will still be missing from the map despite our name_mapping. This is most likely due to dataset limitations, as the authors noted that counties with fewer than 100 low-SES and 100 high-SES individuals were omitted for privacy purposes. However, re-mapped counties, such as Baltimore County, will now be filled in.

m = folium.Map(location=[36, -98], zoom_start=4)

title_html = '''
<h1 align="center" style="font-size:18px"><b>Cohesiveness Rates Around the Country</b></h1>
<p align="center">Cohesiveness is calculated through data regarding clustering_county.</p>
'''
m.get_root().html.add_child(folium.Element(title_html))

folium.Choropleth(
    geo_data=county_geo_data,
    data=df,
    columns=['county_name', 'clustering_county'],
    key_on='feature.properties.NAME',  
    fill_color='Spectral',
).add_to(m)

m

On the mainland of the United States, most counties tend to be red, orange, or yellow, which is quite low on the full scale for cohesiveness. Light green hues, though scarce, tend to appear mostly in the southwest and midwest regions. This suggests that the rate of cohesiveness throughout the country is quite low, but another histogram will be essential to confirm this.

Notably, the highest levels of cohesiveness occur only in Alaska. Initially, one might assume this correlates with population: comparing this map with a population map from the World Population Review, cohesiveness is highest in Alaska's less populated census areas. However, Nome Census Area, which is one of the few areas (Alaska has no counties) highlighted in blue, has a population of 9,825.¹ McMullen County in Texas, one of the smallest counties in the United States, has a population of only 576 as of 2022² yet is still colored orange, meaning its cohesiveness score is still quite low. Evidently, population alone does not determine cohesiveness, although it could play a small role.

Additionally, a significant portion of California is highlighted in red, meaning its cohesiveness is quite low. Specifically, the entire region of Southern California is red, as are some counties near the coast up to the Bay Area.

We will now look at the histogram to confirm this distribution.

Data Exploration #4

df.hist(column='clustering_county')
plt.xlabel("Frequency of Clusters / Mutual Connections Within Every County")
plt.ylabel("Frequency")
plt.title("Frequency of Relationship Cohesiveness Within Every County")
plt.show()

As predicted, clustering tends to be low within each county: the bulk of the histogram sits at the low end, with a tail stretching toward higher values. We’ll now conduct a similar analysis with civic engagement.

Data Exploration #5

m = folium.Map(location=[36, -98], zoom_start=3)

title_html = '''
<h1 align="center" style="font-size:18px"><b>Civic Engagement Rates around the Country</b></h1>
<p align="center">Civic engagement is calculated through data regarding volunteering_rate_county.</p>
'''
m.get_root().html.add_child(folium.Element(title_html))

folium.Choropleth(
    geo_data=county_geo_data,
    data=df,
    columns=['county_name', 'volunteering_rate_county'],
    key_on='feature.properties.NAME',  
    fill_color='Spectral',
).add_to(m)

m

Across the mainland, most counties appear red or orange, with a few more in the green or blue range than on the cohesiveness map. Nonetheless, this means that volunteering rates are relatively low. An important note is that the west (except for California), midwest, and northeast tend to have more counties in the orange range. No state seems to show noticeably higher rates of volunteering than average except for Alaska.

This should also be confirmed with a histogram, but from this map it seems that the volunteering rate will be skewed in much the same way as the cohesiveness rate.

Data Exploration #6

df.hist(column='volunteering_rate_county')
plt.xlabel("Volunteering Rate Within Every County")
plt.ylabel("Frequency")
plt.title("Frequency of Volunteering Rates Within Every County")
plt.show()

The assumption made above is confirmed. We can now move on to our explanatory stage, where we narrow down our visualizations to focus specifically on the research questions.

Explanatory Stage

Following the initial assessment of the data, it seems likely that connectedness does not correlate with cohesiveness. Cohesiveness around the country is heavily skewed, while connectedness is more balanced. It is also worth investigating whether exposure between low-SES and high-SES individuals goes along with higher connectedness. Such a finding would suggest that connectedness is primarily influenced by exposure rather than being solely a matter of cultural emphasis.

However, higher social cohesion likely does lead to higher rates of civic engagement. This is because it builds more of a community and, as we have seen in areas like Alaska, both cohesion and civic engagement tend to be higher together.

To begin, let’s first see if there’s any relationship between exposure and connectedness.

Data Visualization #1

df.plot.scatter(x='ec_county', y='exposure_grp_mem_county')
plt.xlabel("Economic Connectedness between low and high SES")
plt.ylabel("Exposure of Population to Low and High SES")
plt.title("Relationship between Exposure and Connectedness between low and high SES")
plt.show()

There is a strong linear correlation between exposure and connectedness. Counties with greater exposure across socioeconomic groups tend to exhibit higher levels of connectedness. The outliers mostly consist of points with higher exposure but lower connectedness than usual. This is a key finding, as it suggests connectedness is predominantly influenced by the extent of exposure rather than being rooted primarily in a county’s cultural influences. Otherwise, the correlation would be much less linear and outliers would be more frequent.
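
One way to put a number on how linear this relationship is would be a least-squares line with its R² value. The sketch below uses NumPy on invented values; `linear_fit` is a hypothetical helper, and nothing here reproduces results from the actual dataset.

```python
import numpy as np


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, plus R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop pairs where either value is missing.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented stand-in values for connectedness (x) and exposure (y).
toy_ec = [0.4, 0.6, 0.8, 1.0]
toy_exposure = [0.5, 0.7, 0.85, 1.1]

slope, intercept, r_squared = linear_fit(toy_ec, toy_exposure)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

In the project, the same function could be called with df['ec_county'] and df['exposure_grp_mem_county']; an R² close to 1 would back up the description of a strong linear correlation.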

This finding further decreases the likelihood of a correlation between cohesiveness and connectedness. Cohesiveness reflects how tightly knit a social network’s cliques are and how integrated mutual friends are, which results from choices people make. Exposure is not a choice; people are subject to it simply by moving into a certain county. These distinct motivational factors make a relationship between the two less probable.

Data Visualization #2

df.plot.scatter(x='ec_county', y='clustering_county')
plt.xlabel("Economic Connectedness between low and high SES")
plt.ylabel("Level of Cohesiveness")
plt.title("Relationship between Economic Connectedness and Overall Cohesiveness in a County")
plt.show()

Confirming our initial assumption, the graph shows no linear correlation and reveals numerous outliers. Many outliers have lower-than-average connectedness but higher-than-average cohesion, or the converse. Across the economic connectedness spectrum, a significant proportion of counties show low cohesiveness. This suggests that there is little relation; however, cohesiveness may be higher in certain counties with cultures of economic hierarchy. We will test whether this could be valid with another scatterplot that measures connectedness exclusively among those of high SES.

Data Visualization #3

df.plot.scatter(x='ec_high_county', y='clustering_county')
plt.xlabel("Economic Connectedness of those exclusively high SES")
plt.ylabel("Level of Cohesiveness")
plt.title("Economic Connectedness among High SES vs. Overall Cohesiveness by County")
plt.show()

This chart looks quite similar to the previous one. While it shows a slightly more linear trend than the one above, the trend is not strong enough to indicate a correlation. Additionally, many of the outliers follow the same pattern of low connectedness with high cohesion, or the converse.
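
To put "quite similar" and "slightly more linear" on the same scale, both connectedness measures can be correlated with cohesiveness side by side. This is only a sketch: `compare_against` is a hypothetical helper, and the `demo` table is random synthetic data, so its output says nothing about the real counties.

```python
import numpy as np
import pandas as pd

def compare_against(df, target, candidates):
    """Pearson's r of each candidate column with the target, strongest first."""
    r = df[candidates].corrwith(df[target])
    table = pd.DataFrame({"r": r, "abs_r": r.abs()})
    return table.sort_values("abs_r", ascending=False)

# Random synthetic stand-in (NOT the Social Capital Atlas values).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "ec_county": rng.normal(0.8, 0.2, n),
    "ec_high_county": rng.normal(1.2, 0.2, n),
    "clustering_county": rng.normal(0.10, 0.02, n),
})

table = compare_against(demo, "clustering_county", ["ec_county", "ec_high_county"])
print(table.round(3))
```

Run on the real `df`, a clearly larger `abs_r` for `ec_high_county` than for `ec_county` would be the pattern the economic hierarchy idea predicts.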

With this, the economic hierarchy theory loses support. Our focus now turns to the analysis of the relationship between cohesiveness and civic engagement.

Data Visualization #4

df.plot.scatter(x='clustering_county', y='volunteering_rate_county')
plt.xlabel("Level of Cohesiveness")
plt.ylabel("Volunteering Rate Within a County")
plt.title("Relationship between Cohesiveness and Civic Engagement")
plt.show()

Contrary to the expectation stated earlier, this scatterplot displays the weakest correlation among all the variables examined in this report. There is no linear pattern, and most of the points sit near the lower left of the graph. This suggests that increased cohesiveness does not go along with increased civic engagement. In fact, many of the outliers lie at opposite ends of the spectrum: either cohesiveness is high but civic engagement is low, or civic engagement is high but cohesiveness is low.
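
Because most points crowd into one corner and a few outliers sit far away, Pearson's r can be pulled around by those outliers. A rank-based measure (Spearman's rho, which is Pearson's r computed on ranks) is a reasonable second check. The helper name and the made-up `demo` values below are assumptions for illustration only.

```python
import pandas as pd

def pearson_and_spearman(df, x, y):
    """Return (Pearson's r, Spearman's rho) for two columns, skipping missing rows."""
    pair = df[[x, y]].dropna()
    pearson = pair[x].corr(pair[y])
    # Spearman's rho is Pearson's r applied to the ranks of the values.
    spearman = pair[x].rank().corr(pair[y].rank())
    return pearson, spearman

# Made-up values: volunteering always rises with clustering, but one county is extreme.
demo = pd.DataFrame({
    "clustering_county": [0.08, 0.09, 0.10, 0.11, 0.12],
    "volunteering_rate_county": [0.01, 0.02, 0.03, 0.04, 0.30],
})

pearson, spearman = pearson_and_spearman(
    demo, "clustering_county", "volunteering_rate_county"
)
print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

If both values come out near 0 on the real county table, the "no relationship" reading holds for any monotonic trend as well as for a straight-line one.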

We will finish the analysis of this point with a choropleth map to uncover any geographic clusters or trends that warrant further scrutiny and discussion.

Data Visualization #5

m = folium.Map(location=[36, -98], zoom_start=3)

title_html = '''
<h1 align="center" style="font-size:18px"><b>Correlation between Civic Engagement and Cohesiveness</b></h1>
<p align="center">Utilize the legend near the top right to toggle between different layers.</p>
'''
m.get_root().html.add_child(folium.Element(title_html))

folium.Choropleth(
    geo_data=county_geo_url,
    data=df,
    columns=['county_name', 'clustering_county'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    name='Cohesiveness Levels', 
).add_to(m)

folium.Choropleth(
    geo_data=county_geo_url,
    data=df,
    columns=['county_name', 'volunteering_rate_county'],
    key_on='feature.properties.NAME',
    fill_color='YlOrRd',
    name='Civic Engagement Levels', 
).add_to(m)



folium.LayerControl().add_to(m)

m
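
One thing worth checking with a name-based join such as `key_on='feature.properties.NAME'` is whether the keys are unique and actually line up, since several US states share county names (Washington County, for example). The sketch below is a generic check. `check_join_keys` is a hypothetical helper, and the tiny table and GeoJSON are made up, because I am not assuming anything about the contents of `county_geo_url`.

```python
import pandas as pd

def check_join_keys(df, key_col, geojson, prop="NAME"):
    """Summarise how well a DataFrame key column lines up with a GeoJSON property."""
    data_keys = df[key_col].dropna()
    geo_keys = pd.Series([f["properties"][prop] for f in geojson["features"]])
    return {
        "duplicate_data_keys": sorted(data_keys[data_keys.duplicated()].unique()),
        "duplicate_geo_keys": sorted(geo_keys[geo_keys.duplicated()].unique()),
        "data_keys_missing_from_geo": sorted(set(data_keys) - set(geo_keys)),
    }

# Made-up miniature inputs, only to show the shape of the check.
demo_df = pd.DataFrame({"county_name": ["Washington", "Washington", "Ventura"]})
demo_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME": "Washington"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Washington"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME": "Niobrara"}, "geometry": None},
    ],
}

report = check_join_keys(demo_df, "county_name", demo_geo)
print(report)
```

Empty lists in all three entries would mean every county is matched one-to-one. Anything else suggests joining on a unique identifier, such as a county FIPS code, if both sources provide one.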

With the exception of Alaska, most areas seem to share a general pattern of low cohesiveness and low volunteer rates, with no behavior that particularly stands out. While some counties, such as Niobrara County in Wyoming, have an abnormally high civic engagement rate compared to their surrounding areas, I could not find additional research or information about them to draw on for the discussion.
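
Spotting counties that "stand out" on a map is subjective, so a simple rule can make it repeatable. The sketch below flags counties whose volunteering rate is more than two standard deviations above the mean. The function name, the threshold of 2, and the `demo` values and county labels are all assumptions for illustration.

```python
import pandas as pd

def flag_high(df, value_col, name_col="county_name", threshold=2.0):
    """Return rows whose z-score in value_col exceeds the threshold, highest first."""
    values = df[value_col]
    z = (values - values.mean()) / values.std()  # sample standard deviation
    flagged = df.loc[z > threshold, [name_col, value_col]].copy()
    flagged["z_score"] = z[z > threshold]
    return flagged.sort_values("z_score", ascending=False)

# Made-up counties and rates; only "County J" is far above the rest.
demo = pd.DataFrame({
    "county_name": [f"County {c}" for c in "ABCDEFGHIJ"],
    "volunteering_rate_county": [0.04, 0.05, 0.06, 0.05, 0.04,
                                 0.06, 0.05, 0.05, 0.05, 0.30],
})

flagged = flag_high(demo, "volunteering_rate_county")
print(flagged)
```

Note that this compares each county with the overall mean rather than with its neighbors. A comparison against the surrounding area would need a state or region column, which I have not assumed here.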

Discussion

Key Findings

The research question aimed to address the connection between connectedness and cohesiveness, along with the relationship between cohesiveness and civic engagement. According to the Social Capital Atlas dataset and the data visualizations generated in this report, there is no direct correlation within either pair. Cohesiveness remains generally low even as connectedness increases. Both civic engagement and cohesiveness tend to stay low for most counties, with the exception of some outliers. Neither appears to drive changes in the other.

Interpretation

The outcome of the analysis did not meet the initial expectations on every point. While connectedness and cohesiveness were expected to be uncorrelated due to their distinct motivational drivers, cohesiveness and civic engagement were expected to be connected, as both reflect voluntary actions.

One plausible explanation for the lack of correlation between connectedness and cohesiveness is that race plays a more significant role. In an intercultural research study conducted in the Netherlands, researchers found that wealthy Dutch individuals had a stronger preference for friendships with cultural similarities; therefore, they had fewer interethnic friendships. Higher-SES non-Western minority members, on the other hand, sought more interethnic friendships.³ This resonates with observations within the United States, where socioeconomic classes within neighborhoods are often divided along racial lines. One example is Ventura, as mentioned at the beginning of the report. A research report found that a greater percentage of minorities, such as Hispanic/Latino, African American, and Asian residents, lived in areas where there was toxicity-weighted pesticide use.⁴

Regarding cohesiveness and civic engagement, the absence of a direct relationship could stem from the fact that cohesiveness primarily measures the existence of, and inclusion within, social cliques, without capturing the depth of relationships or social dynamics. When investigating civic engagement, researchers found that people who felt a stronger connection to their neighborhood were more likely to want to participate in volunteer work.⁵ The cohesiveness measure is not limited to strong relationships; if it were, the results would likely be different.

An intriguing exception was Alaska, the only region with higher rates of both cohesiveness and civic engagement throughout a majority of its counties. The Foraker Group, a non-profit organization in Anchorage, Alaska, reports that the state ranks 5th in the nation for volunteerism.⁶ Considering that Alaska has the third-smallest population in the United States, this is significant.⁷ Nearly half of Alaskan residents donate to charity, and the group estimates that about a third of adults serve as board members for a nonprofit. While it is harder to find secondary literature that explains Alaska’s cohesiveness, these figures help explain why its civic engagement rates are higher than those of most other states. Since so much of the population participates, the state’s culture may place a higher emphasis on community, hence the higher rates.

Implication

Considering these interpretations, there are a few implications for societal understanding. For starters, research articles have previously recognized socioeconomic segregation in friendship networks, romantic partnerships, neighborhoods, education, and workplaces.⁸ Yet this study suggests that the divide also depends on other factors, such as race and cultural differences. This is crucial when addressing the persistent wealth gap within American society.

Additionally, increasing civic engagement in society requires fostering a strong community. Since there is no correlation between cohesiveness and civic engagement, a strong community is not simply one in which many people are connected and involved; rather, it is one that can establish a powerful identity and inspire others to become a part of it. A common experience in many people’s lives is meeting and networking with as many people as one can, but this report suggests that doing so might not always serve personal growth and might be more superficial.

Limitation

This research report was significantly limited by the fact that the dataset is derived from Facebook data. Relying on Facebook data has its constraints, as it may not fully capture the strength of relationships or accurately represent offline engagement. Additionally, civic engagement is measured through Facebook groups, while it is likely that many users do not record their membership in volunteer groups online. Many users might also befriend people they know of but are not truly connected to, simply because they like having a high friend count. Hence, the results for civic engagement and connectedness may be skewed. Combining self-reported data with Facebook-derived data could offer a more comprehensive perspective.

Conclusion

This study finds no direct correlation between SES connectedness and cohesiveness, nor between cohesiveness and civic engagement.

Future studies could explore the impact of close relationships on volunteer rates versus that of relationships between acquaintances. Investigating other factors among the counties, such as racial, cultural, and generational diversity, could enrich the understanding of the subject and its intertwining dynamics.

While this research study found no correlation, it lays the foundation for more focused and specialized research on the impacts of social capital.